This is an R notebook that uses Stan, a probabilistic programming language, to fit Bayesian models. This notebook is meant to serve as an example of reproducible research, by fully documenting an analytical workflow, and also to demonstrate some best practices in Bayesian model assessment.
The analysis presented below is based on Rand, Arbesman, and Christakis (2011) (available here) where the authors consider the role of social networks in the development of large-scale cooperation among individuals in an economic game. The authors use a logistic regression model to examine individuals’ decisions (cooperation or defection) and show that social networks which can be frequently updated by participants (rather than fixed throughout the course of the game or randomly updated) foster cooperative decisions in this setting.
Although the code and specific models provided here are particular to this dataset, one goal of this notebook is to provide a general outline of a “good” workflow for Bayesian analyses. In different settings or with different data, each section outlined here might be implemented differently (for example, adding an imputation model to deal with missing covariate data or utilizing a more complex model). However, we argue that by documenting the complete workflow – from data collection to answering research hypotheses to model checking – as we have done here, research becomes more reproducible and, additionally, cross-study comparisons can be more easily tackled.
This notebook uses the following R packages:
Stan (via the rstan interface) to fit Bayesian models, rstanarm for pre-built Bayesian regression models, and boot for bootstrapping. The notebook was built using R version 3.4.0. Please be sure that all packages (and their dependencies) are installed and that the version of R on your machine is up to date.
To increase transparency, we aim to document a (perhaps simplified) workflow in its entirety: from the generation of research hypotheses, through data collection and cleaning, to modeling techniques, all the way through uncertainty evaluation and prediction.
Generally, Rand, Arbesman, and Christakis (2011) are interested in considering the role of a social network in fostering cooperative actions among a group of individuals. Although cooperation is beneficial for large groups, at the individual level, cooperative actions often leave the individual exposed to exploitation. Some evolutionary game theory models suggest that the social network of individuals can help explain the proliferation of cooperative strategies. See Rand, Arbesman, and Christakis (2011) for more details.
Previous empirical research has only considered fixed, non-dynamic social networks. The authors argue that implementing dynamically-updated networks allows for a new type of conditional action, where individuals may change their decision to cooperate based on changes in the network structure.
The authors hypothesize that implementing frequent user-updated social networks will more strongly promote cooperation, relative to other conditions.
In their paper, the authors discuss other interesting hypotheses, but for demonstration we will focus only on this particular hypothesis.
Subjects were recruited using the online labor market Amazon Mechanical Turk (see the Supporting Information for Rand, Arbesman, and Christakis (2011) for more details).
The authors randomly assigned 785 participants to one of four conditions (see below) in a series of 40 realizations of network experiments (average network size = 19.6; SD = 6.4). In all conditions, subjects play a repeated cooperative dilemma (each game/session consisted of ## rounds) in an artificial social network created in the virtual laboratory:
Additionally, the experimenters implemented a variety of different conditions/settings for the behavior of the network in the game:
This data has been made publicly available (here), which greatly improves reproducibility.
Below, we display 50 randomly selected rows from this dataset:
We should check for any missing or implausible values in the imported data set:
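The summary shown below was produced with base R's `summary()`. A quick sketch of these checks (using a small stand-in data frame here, since in the notebook they would be applied to the imported `cooperation` data set):

```r
# `cooperation` is the imported data set in the notebook; a small stand-in
# with a deliberately missing value illustrates the same checks.
cooperation <- data.frame(
  round_num = c(1, 2, NA, 1),
  decision  = c(0, 1, 1, 0)
)

summary(cooperation)         # per-column ranges, quartiles, and NA counts
colSums(is.na(cooperation))  # explicit per-column missing-value counts
```

Implausible values (e.g., a decision outside {0, 1} or a negative round number) show up directly in the min/max rows of the summary.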
## sessionnum condition playerid decision..0.D.1.C.
## Min. : 1.00 Fluid :1034 Min. : 102 Min. :0.0000
## 1st Qu.: 9.00 Random :1015 1st Qu.: 918 1st Qu.:0.0000
## Median :19.00 Static :1008 Median :1915 Median :1.0000
## Mean :21.83 Viscous: 819 Mean :2196 Mean :0.5325
## 3rd Qu.:34.00 3rd Qu.:3407 3rd Qu.:1.0000
## Max. :52.00 Max. :5240 Max. :1.0000
##
## previous_decision round_num num_neighbors group_size
## Min. :0.0000 Min. : 1.000 Min. : 0.000 Min. :10.00
## 1st Qu.:0.0000 1st Qu.: 2.000 1st Qu.: 3.000 1st Qu.:15.00
## Median :1.0000 Median : 3.000 Median : 5.000 Median :19.00
## Mean :0.5361 Mean : 3.863 Mean : 5.239 Mean :20.11
## 3rd Qu.:1.0000 3rd Qu.: 5.000 3rd Qu.: 7.000 3rd Qu.:25.00
## Max. :1.0000 Max. :11.000 Max. :18.000 Max. :33.00
## NA's :785
## fluid_dummy
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.2668
## 3rd Qu.:1.0000
## Max. :1.0000
##
The dataset imported here has already been cleaned for this analysis. The authors chose to drop inactive participants from this dataset (see the Supporting Information for Rand, Arbesman, and Christakis (2011) for more details).
We present a very brief example of some exploratory data analysis tasks. In practice, this section could be fleshed out much more.
Here are some of the statistics for this data (generated in-line):
Each session is assigned a particular network link updating strategy (condition) and may contain a different number of players. Below is the summarized session information:
Note that the number of players varies both within and between experimental conditions:
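A per-session table like the one above can be built with base R's `aggregate()`; a sketch using a toy stand-in for the `cooperation` data (the column names `sessionnum`, `condition`, and `playerid` match the summary output earlier):

```r
# Toy stand-in for the cooperation data: one row per player-round
cooperation <- data.frame(
  sessionnum = c(1, 1, 1, 2, 2, 3),
  condition  = c("Fluid", "Fluid", "Fluid", "Static", "Static", "Random"),
  playerid   = c(101, 102, 101, 201, 202, 301)
)

# Number of distinct players in each session, together with its condition
session_info <- aggregate(playerid ~ sessionnum + condition,
                          data = cooperation,
                          FUN = function(x) length(unique(x)))
names(session_info)[3] <- "n_players"
session_info
```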
In the following sections, we consider how players change their behavior over the course of a game.
In this analysis, we focus on fitting Bayesian versions of the logistic regression models implemented in Rand, Arbesman, and Christakis (2011). In their paper, the main model focuses on the round number and the condition used for network link updating (i.e., static, random, viscous, or fluid). This model also accounts for dependence across decisions made by the same player as well as across players in the same session by incorporating random effects at the individual- and session-level.
Let \(y_{ijk}\) denote the decision made by player \(j\) in the \(i\)th round of session \(k\), so that \(y_{ijk}=1\) if player \(j\) chooses to cooperate in round \(i\) of session \(k\) and \(y_{ijk}=0\) otherwise. Then our model is simply \[\begin{align*} Y_{ijk} & \sim \text{Bernoulli}(p_{ijk}) \\ \text{logit}(p_{ijk}) &= \beta_1 + \beta_2 x_{1ijk} + \beta_3 x_{2k} + \beta_4 x_{1ijk} x_{2k} + \tau^{(indiv)}_j + \tau^{(sess)}_{k} \\ \tau^{(indiv)}_j & \sim \text{Normal}(0,\sigma^{(indiv)}) \\ \tau^{(sess)}_k & \sim \text{Normal}(0,\sigma^{(sess)}) \end{align*}\] where \(p_{ijk}\) is the probability that player \(j\) cooperates in round \(i\) of session \(k\), \(x_{1ijk}\) is the round number, and \(x_{2k}\) is an indicator for the network behavior setting of the \(k\)th session, with \(x_{2k}=1\) if the \(k\)th session has fluid network link updating and \(x_{2k}=0\) otherwise. The scale parameters \(\sigma^{(indiv)}\) and \(\sigma^{(sess)}\) capture the variation across players and across sessions, respectively.
Note that the main conclusion from Rand, Arbesman, and Christakis (2011) states that rapidly updating networks promote cooperation relative to other conditions, which is confirmed using a one-sided hypothesis test for \(\beta_4\). In the paper, the authors provide an estimate, \(\hat{\beta}_4=0.135\), with a small p-value, \(0.006\).
In the following sections, we demonstrate how to fit this model and perform the hypothesis test using Stan.
No variable transformations are necessary for this analysis. Round number is an integer, and we use a dummy indicator for the fluid condition, e.g.
cooperation$fluid_dummy <- 1 * (cooperation$condition == "Fluid")

To keep track of players and sessions, we create index tables for each, so that they can be easily renumbered:
playerIDs <- data.frame(id = unique(cooperation$playerid), index = 1:length(unique(cooperation$playerid)))
indivIndex <- unlist(lapply(1:nrow(cooperation), function(i) {
playerIDs$index[playerIDs$id == cooperation$playerid[i]]
}))
sessionIDs <- data.frame(id = unique(cooperation$sessionnum), index = 1:length(unique(cooperation$sessionnum)))
sessIndex <- unlist(lapply(1:nrow(cooperation), function(i) {
sessionIDs$index[sessionIDs$id == cooperation$sessionnum[i]]
}))

First, we can specify our model directly in R, as shown below, or save it using a text editor as a separate .stan file. The code provided below was written with readability and ease of interpretation in mind; for more details about how to write efficient model code in Stan, see the documentation (e.g., Section 9.9 in the Language Manual).
logit_model <- '
data {
// Dimensions
int<lower=1> N; // Number of observations
int<lower=1> p; // Number of parameters
int<lower=1> N_indiv; // Number of individuals
int<lower=1> N_sess; // Number of sessions
// Indices
int<lower=1,upper=N_indiv> indiv[N];
int<lower=1,upper=N_sess> sess[N];
// Outcome
int<lower=0,upper=1> y[N];
// Covariates
real x1[N];
real x2[N];
}
parameters {
real beta[p];
real tau_indiv[N_indiv];
real tau_sess[N_sess];
real<lower=0> prec_indiv;
real<lower=0> prec_sess;
}
transformed parameters {
real<lower=0> sigma_indiv;
real<lower=0> sigma_sess;
sigma_indiv = sqrt(1/prec_indiv);
sigma_sess = sqrt(1/prec_sess);
}
model {
for (i in 1:N){
y[i] ~ bernoulli_logit( beta[1] + beta[2]*x1[i] +
beta[3]*x2[i] + beta[4]*x1[i]*x2[i] +
tau_indiv[indiv[i]] +
tau_sess[sess[i]] );
}
tau_indiv ~ normal( 0, sigma_indiv );
tau_sess ~ normal( 0, sigma_sess );
beta ~ normal( 0, 100 );
prec_indiv ~ gamma(1, 1);
prec_sess ~ gamma(1, 1);
}
'

First, we package the necessary data for use with Stan, in a list format:
dat <- list(N = nrow(cooperation), p = 4, y = cooperation$decision..0.D.1.C.,
    x1 = cooperation$round_num, x2 = cooperation$fluid_dummy,  # x1: round number, x2: fluid indicator, matching the model statement
    indiv = indivIndex, N_indiv = max(indivIndex), sess = sessIndex, N_sess = max(sessIndex))

Then, we can run the model in R.
resStan <- stan(model_code = logit_model, data = dat, init = list(list(prec_indiv = 1,
    prec_sess = 1)), chains = 1, iter = 3500, warmup = 500, thin = 1)

Convergence of the MCMC algorithm can be monitored and inspected (note that the gray shaded areas represent the warmup, or burn-in, period).
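In addition to visual inspection, numerical diagnostics are worth checking: `print(resStan)` reports the potential scale reduction statistic (R-hat) and effective sample size for every parameter. As a sketch of what R-hat measures, a simplified split-chain version for a single chain of draws can be computed by hand (following Gelman et al. 2014, Ch. 11; in practice one would just read the value from `print(resStan)`):

```r
# Split-Rhat for a single chain: split the draws in half and treat the
# halves as two chains; values near 1 indicate good mixing.
split_rhat <- function(draws) {
  n <- floor(length(draws) / 2)
  chains <- cbind(draws[1:n], draws[(n + 1):(2 * n)])
  chain_means <- colMeans(chains)
  B <- n * var(chain_means)           # between-chain variance
  W <- mean(apply(chains, 2, var))    # within-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_hat / W)
}

set.seed(1)
split_rhat(rnorm(2000))       # well-mixed, stationary draws: close to 1
split_rhat(seq_len(1000))     # a strongly trending "chain": well above 1
```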
## Show traceplot
traceplot(resStan, pars = c("beta", "sigma_indiv", "sigma_sess"), inc_warmup = TRUE)

Rather than specifying the model code (or a .stan model file) ourselves, we could also make use of the rstanarm package, which provides wrapper functions for Stan for a variety of common models, including logistic regression.
cooperation$decision = cooperation$decision..0.D.1.C.
glm.out = stan_glmer(decision ~ fluid_dummy + round_num + fluid_dummy * round_num +
(1 | playerid) + (1 | sessionnum), data = cooperation, family = binomial(link = "logit"),
chains = 1, iter = 1000)

## trying deprecated constructor; please alert package maintainer
We can easily extract posterior samples from the model and look at posterior means for the regression coefficients:
## intercept fluid_dummy round_num fluid_dummy*round_num
## [1,] 1.1738 0.0777 -0.2095 0.0374
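The posterior means above are just column averages of the posterior draws. Assuming the draws are arranged in an iterations-by-coefficients matrix (e.g., via `as.matrix(glm.out)` from rstanarm, or `rstan::extract(resStan)$beta`), the computation is a one-liner; a toy draws matrix stands in for the real samples here:

```r
# Toy stand-in for a posterior draws matrix (3000 draws of 4 coefficients),
# centered near the estimates reported above.
set.seed(42)
beta_draws <- matrix(rnorm(3000 * 4, mean = c(1.17, 0.08, -0.21, 0.04), sd = 0.1),
                     nrow = 3000, ncol = 4, byrow = TRUE)
colnames(beta_draws) <- c("intercept", "fluid_dummy", "round_num",
                          "fluid_dummy*round_num")

round(colMeans(beta_draws), 4)  # posterior mean of each coefficient
```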
Below, we plot our draws from the marginal posterior distributions for the model parameters. Our posterior estimates (sample means) are indicated by the vertical red lines. Because we are often interested in finding non-zero effects, dashed vertical blue lines are plotted at the origin.
Recall that the authors are interested in testing the hypothesis of whether or not frequent user-updated social networks promote cooperation, relative to other conditions. This can be directly tested by testing for a positive \(\beta_4\) (posterior mean = 0.0374). In our Bayesian setting, this means that we are interested in the posterior probability that \(\beta_4\) (given our model and the observed data) is greater than zero. We can estimate this probability using the posterior samples from our model to calculate the proportion of draws from the posterior distribution, \(p(\beta_4|y)\), that are greater than zero:
sum(list_of_draws$beta[, 4] > 0)/length(list_of_draws$beta[, 4])

## [1] 0.768
Thus, our model provides evidence in support of the authors’ claim: frequently user-updated social networks promote cooperation relative to the other conditions.
A reasonable way to consider whether the model fits the data (i.e., captures or addresses the important features of the observed data) is to consider whether simulations from the model “look like” the observed data. In this Bayesian setting, we can simply examine draws from the posterior predictive distribution, which is obtained by feeding posterior draws for model parameters back into the model to simulate predicted outcomes. In the following sections, we consider a variety of perspectives to address whether simulations from the model look like the observed data. See Gelman et al. (2014 Chapter Six) for further motivation and more details.
If the model fits well, then replicated data generated under the model should look similar to the observed data. We can check this by obtaining posterior predictions for the observed data. We simply need to add the following few lines to our Stan model code:
"generated quantities {
vector[N] y_tilde;
for (i in 1:N)
y_tilde[i] = bernoulli_logit_rng( beta[1] + beta[2]*x1[i] +
beta[3]*x2[i] + beta[4]*x1[i]*x2[i] +
tau_indiv[indiv[i]] +
tau_sess[sess[i]] );
}"

Then, we can fit this model just as before:
postPredStan <- stan(model_code = logit_postPred, data = dat, init = list(list(prec_indiv = 1,
prec_sess = 1)), chains = 1, iter = 3500, warmup = 500, thin = 1)

and easily extract the results with rstan::extract(postPredStan),
which, by default, excludes the warm-up or burn-in samples and permutes the ordering of the remaining posterior samples.
Because we are concerned with individual-level behavior, to help visualize the data we first organize it into a matrix in which each row corresponds to a unique player and each column represents a round of play (across all conditions, there are up to 11 rounds). The plot below displays this matrix as an image, so that we can examine the decision history, cooperate (colored bar) or defect (grey bar), for every player in each round.
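A hypothetical helper for building this player-by-round matrix (assuming columns named `playerid`, `round_num`, and a 0/1 `decision`, as in this data set) might look like:

```r
# Build a player-by-round matrix of decisions; NA marks rounds a player
# did not play. Expected columns: playerid, round_num, decision (0/1).
decision_matrix <- function(df) {
  players <- sort(unique(df$playerid))
  rounds  <- sort(unique(df$round_num))
  m <- matrix(NA_real_, nrow = length(players), ncol = length(rounds),
              dimnames = list(players, rounds))
  m[cbind(match(df$playerid, players), match(df$round_num, rounds))] <- df$decision
  m
}

# Toy stand-in data: two players, up to three rounds
toy <- data.frame(playerid  = c(1, 1, 1, 2, 2),
                  round_num = c(1, 2, 3, 1, 2),
                  decision  = c(1, 1, 0, 0, 1))
decision_matrix(toy)
```

The resulting matrix can be drawn with `image()` (or a similar heatmap function) to produce a plot like the one described above.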
Now we can compare this plot for the original (true) data to realizations from the posterior distribution from our model:
## using data from the first layer
## Removing the following cognostics that are all NA: value_mean
In this case, there do not appear to be any obvious differences between the true data and our posterior predictions.
In this setting, the large number of individual participants can make it difficult to spot differences in the data. Instead, we can use summary measures or trends to compare our posterior predictions to the original data.
Below, we average individual behavior by round (recall from the plots above that later rounds typically have far fewer participants) and plot the average cooperation level by round:
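These round-by-round averages can be computed with base R's `aggregate()`; a toy stand-in (columns `round_num`, `fluid_dummy`, and a 0/1 `decision`, matching this data set) shows the idea:

```r
# Toy stand-in: mean cooperation per round, split by fluid vs non-fluid
cooperation <- data.frame(
  round_num   = c(1, 1, 2, 2, 1, 1, 2, 2),
  fluid_dummy = c(1, 1, 1, 1, 0, 0, 0, 0),
  decision    = c(1, 1, 1, 0, 1, 0, 0, 0)
)

coop_by_round <- aggregate(decision ~ round_num + fluid_dummy,
                           data = cooperation, FUN = mean)
coop_by_round  # average cooperation rate for each round/condition cell
```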
Recall that we are not modeling each session type individually, but have only included an indicator for fluid vs. non-fluid sessions in our model. Observed variation across predictions for non-fluid sessions is a result of the estimated session-level random effects.
Since our focus is on the performance of the fluid network setting relative to all other settings, we can consider collapsing all non-fluid rounds together in order to make the comparison clearer:
We can also consider players’ tendency to switch between cooperating or defecting in the posterior samples (recall that we looked at these proportions in our exploratory analysis), by averaging over all rounds and players.
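Switching behavior can be tabulated directly from the `previous_decision` and `decision` columns (0 = defect, 1 = cooperate); a toy stand-in illustrates the computation:

```r
# Toy stand-in: proportion of each transition type (previous -> current)
cooperation <- data.frame(
  previous_decision = c(1, 1, 0, 0, 1, 0),
  decision          = c(1, 0, 0, 1, 1, 0)
)

labels <- c("D", "C")  # 0 = defect, 1 = cooperate
transitions <- table(
  previous = labels[cooperation$previous_decision + 1],
  current  = labels[cooperation$decision + 1]
)
prop.table(transitions)  # transitions["C", "D"] counts cooperate -> defect switches
```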
Again, the truth (top left plot) is hard to distinguish from the posterior samples.
Another perspective could consider game sessions individually.
Had we seen any inconsistencies between our posterior predictions and the observed data, we could consider developing formal test quantities to further examine those differences. For example, say that we had noticed a difference in the proportion of times individuals move from cooperation to defection across rounds (i.e., differences in the height of the C –> D bars across the barplots above). We could use the observed proportion in each draw or sample from the posterior predictive distribution to test whether or not there is any systematic difference between our posterior predictions and the observed data in this respect. Of course, other test quantities could certainly be developed. For more discussion, see Gelman et al. (2014, Chapter 6).
In this section, we use bootstrapped data sets (resampling, with replacement, rows (individual-round outcomes) from the original data set) to evaluate model fit.
Here, we simply treat each row as an individual data point, an outcome (cooperate or defect) for a particular individual in a particular round.
if (runModels) {
logit.bootstrap <- function(data, indices) {
d <- data[indices, ]
fit.glm <- glm(decision..0.D.1.C. ~ fluid_dummy * round_num, data = d,
family = "binomial")
return(coef(fit.glm))
}
logit.boot <- boot(data = cooperation, statistic = logit.bootstrap, R = 30) # R: number of samples
fileName <- "./lib/logit.stan"
stan_code <- readChar(fileName, file.info(fileName)$size)
cat(stan_code)
logit.stan.bootstrap <- function(data, indices) {
d <- data[indices, ]
dat <- list(N = nrow(d), p = 4, decision = d$decision..0.D.1.C., fluid_dummy = d$fluid_dummy,
round_num = d$round_num)
fit.stan.logit <- stan(model_code = stan_code, data = dat, chains = 3,
iter = 3000, warmup = 500, thin = 10)
list_of_draws.bootstrap <- extract(fit.stan.logit)
return(colMeans(list_of_draws.bootstrap$beta))  # posterior means of the coefficients
}
logit.stan.boot <- boot(data = cooperation, statistic = logit.stan.bootstrap,
R = 3) # R: number of samples
}

However, sampling each row of the data set completely independently ignores the fact that players are grouped into sessions, with each player able to interact only with other players in their assigned session.
In this section, we resample sessions (each session consists of a group of rows, i.e. a collection of individual-round outcomes).
First, specify the number of bootstrapped sets:
num.boot.sets = 2

Then, we can construct the bootstrapped data sets:
if (runModels) {
samp = matrix(nrow = num.boot.sets, ncol = length(unique(cooperation$sessionnum)))
logit.bootstrap.session = matrix(nrow = num.boot.sets, ncol = 4)
bootstrap.session.beta = array(rep(0, num.boot.sets * 4 * 750), c(750, 4,
num.boot.sets))
for (i in 1:num.boot.sets) {
samp[i, ] <- sample(unique(cooperation$sessionnum), 40, replace = TRUE)
tmp.assign <- do.call(rbind, lapply(samp[i, ], function(x) cooperation[cooperation$sessionnum ==
x, ]))
logit.bootstrap.session.tmp <- glm(decision..0.D.1.C. ~ fluid_dummy *
round_num, data = tmp.assign, family = "binomial")
logit.bootstrap.session[i, ] = logit.bootstrap.session.tmp$coefficients
dat.bootstrap.session.tmp <- list(N = nrow(tmp.assign), p = 4, decision = tmp.assign$decision..0.D.1.C.,
fluid_dummy = tmp.assign$fluid_dummy, round_num = tmp.assign$round_num)
fileName <- "./lib/logit.stan"
stan_code <- readChar(fileName, file.info(fileName)$size)
cat(stan_code)
resStan.bootstrap.session.tmp <- stan(model_code = stan_code, data = dat.bootstrap.session.tmp,
chains = 3, iter = 3000, warmup = 500, thin = 10)
list_of_draws.bootstrap.session.tmp <- extract(resStan.bootstrap.session.tmp)
bootstrap.session.beta[, , i] <- list_of_draws.bootstrap.session.tmp$beta
}
}

This table shows a summary of the bootstrapping results.
Pairwise distances between the posterior distributions (the original Bayesian fit and each bootstrap replicate), shown one matrix per regression coefficient, are as follows:
## , , 1
##
## orig_bayes boot1 boot2
## orig_bayes 0.00000000 0.021056239 0.022523957
## boot1 0.02186232 0.000000000 0.007666615
## boot2 0.02325961 0.007626398 0.000000000
##
## , , 2
##
## orig_bayes boot1 boot2
## orig_bayes 0.00000000 0.09715858 0.09403902
## boot1 0.09552282 0.00000000 0.01746229
## boot2 0.09261596 0.01750814 0.00000000
##
## , , 3
##
## orig_bayes boot1 boot2
## orig_bayes 0.0000000000 0.0003780843 0.0004567395
## boot1 0.0003777219 0.0000000000 0.0002521667
## boot2 0.0004556588 0.0002524070 0.0000000000
##
## , , 4
##
## orig_bayes boot1 boot2
## orig_bayes 0.000000000 0.01300678 0.001687616
## boot1 0.012650247 0.00000000 0.011715751
## boot2 0.001688216 0.01204889 0.000000000
The degree of reproducibility of scientists’ results and findings has increasingly become an important area of interest. In part this is due to the “replication crisis” (for more discussion, see some of the posts on Andrew Gelman’s blog - here and here) and, generally speaking, the idea of reproducible research is hard to argue against. However, what exactly fully reproducible research looks like in practice is still up for debate. In the analysis presented here, we have tried to take steps in this direction by documenting the complete workflow, from data collection through model checking, in a single executable notebook built on publicly available data and code.
Stodden et al. (2016) (available here) provide a good discussion of some characteristics of an (ideal) reproducible analysis, including some of those we have tried to incorporate here.
Like other probabilistic programming languages, Stan provides built-in computational tools for inference (e.g., parameter estimation, hypothesis testing) in Bayesian models, which in most cases would otherwise require much more work (i.e., if we had to code the samplers and algorithms by hand). In addition to R (used here), Stan interfaces with a variety of data analysis languages, such as Python, the shell, MATLAB, Julia, and Stata, which makes Stan even more accessible. Additionally, Stan models utilize highly efficient sampling algorithms and, in R, can be further investigated with very useful helper packages, such as shinyStan, which provides easy access to visual and numerical summaries of model features and diagnostics, and rstanarm, which provides wrapper functions for many common pre-built Bayesian models.
For more discussion of Bayesian methods for data analysis and best practices in the field, we recommend Gelman et al. (2014).
Gelman, Andrew, John B. Carlin, Hal S. Stern, and David B. Dunson. 2014. Bayesian Data Analysis. Vol. 2.
Rand, David G., Samuel Arbesman, and Nicholas A. Christakis. 2011. “Dynamic Social Networks Promote Cooperation in Experiments with Humans.” Proceedings of the National Academy of Sciences 108 (48): 19193–98.
Stodden, Victoria, Marcia McNutt, David H. Bailey, Ewa Deelman, Yolanda Gil, Brooks Hanson, Michael A. Heroux, John P. A. Ioannidis, and Michela Taufer. 2016. “Enhancing Reproducibility for Computational Methods.” Science 354 (6317): 1240–41.